Co-Reference Resolution for the Indonesian Language Using Association Rules

Authors

  • Indra Budi
  • Stéphane Bressan
  • Nasrullah
Abstract

In this paper, we propose a co-reference resolution method for texts in the Indonesian language. The objective of co-reference resolution is to identify equivalence between entities, as well as between pronouns and entities, that were recognized in a named entity recognition phase. Our method uses association rules. It combines several features, such as pronoun and name classes, string similarity, and position in the text, into a vector of attributes. Applied to a corpus of newspaper articles in the Indonesian language, the method yields an F-measure of 84.12%. We compare this result to that of a state-of-the-art machine learning method for co-reference resolution, the decision tree, and find the two comparable.
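The abstract names the ingredients of the method but not their details, so the following is only a minimal sketch of the general idea and not the authors' implementation. It encodes a pair of mentions as a set of discrete attributes (classes, string similarity, position) and measures the support and confidence of one candidate association rule. The `Mention` type, the class labels, the thresholds, the `coref=yes`/`coref=no` items, and the use of `difflib` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Mention(NamedTuple):
    text: str      # surface form of the mention
    kind: str      # hypothetical class label, e.g. "PERSON" or "PRONOUN"
    sentence: int  # index of the sentence containing the mention


def pair_features(a: Mention, b: Mention) -> frozenset:
    """Encode a mention pair as a set of attribute=value items."""
    # difflib stands in for whatever string similarity the paper uses.
    sim = SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()
    return frozenset({
        f"class_a={a.kind}",
        f"class_b={b.kind}",
        # Both thresholds below are arbitrary choices for this sketch.
        "sim=high" if sim >= 0.5 else "sim=low",
        "dist=near" if abs(a.sentence - b.sentence) <= 1 else "dist=far",
    })


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | {consequent}
    n_ante = sum(1 for r in rows if antecedent <= r)
    n_both = sum(1 for r in rows if both <= r)
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy labelled pairs; the labels are invented for the example.
m1 = Mention("Susilo Bambang Yudhoyono", "PERSON", 0)
m2 = Mention("Yudhoyono", "PERSON", 1)
m3 = Mention("Jakarta", "LOCATION", 3)
rows = [
    pair_features(m1, m2) | {"coref=yes"},
    pair_features(m1, m3) | {"coref=no"},
    pair_features(m2, m3) | {"coref=no"},
]
support, confidence = rule_stats(rows, {"sim=high", "dist=near"}, "coref=yes")
print(support, confidence)
```

In a fuller system, rules whose support and confidence exceed chosen minimums would be mined from annotated pairs and then applied to unseen pairs. How the paper selects and applies its rules is not stated in the abstract.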

Similar resources

Application of association rules mining to Named Entity Recognition and co-reference resolution for the Indonesian language

In this paper, we propose a new method, association rules mining, for Named Entity Recognition (NER) and co-reference resolution. The method combines several morphological and lexical features, such as Pronoun Class (PC), Name Class (NC), String Similarity (SP), and Position (P) in the text, into a vector of attributes. Applied to a corpus of newspaper articles in the Indonesian language, the method outperf...

Naturalization in Translation: A Case Study on the Translation of English-Indonesian Medical Terms

Naturalization is a translation procedure that is predominantly utilized in the translation of English medical terms into Indonesian. This study focuses on identifying types of naturalization involving the adjustment of spelling and pronunciation and investigating whether naturalization has been appropriately applied based on the rules in the Indonesian general guidance of term formation. The d...

A rule based solution to co-reference resolution in clinical text

OBJECTIVE To build an effective co-reference resolution system tailored to the biomedical domain. METHODS Experimental materials used in this study were provided by the 2011 i2b2 Natural Language Processing Challenge. The 2011 i2b2 challenge involves co-reference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each doc...

A Two-Level Morphological Analyser for the Indonesian Language

This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc...

Publication date: 2006